Distributional Word Clustering in Parallel

Authors

Alan L. Ritter

James W. Hearne

Philip A. Nelson

Abstract

We discuss various methods that have been applied to grouping words into syntactic and semantic categories, focusing on how they deal with the problems of sparsity and computational complexity. We then present a method of distributional clustering and discuss the parallelization of the most computationally intensive part of this process.

Related resources

An Information Theoretic Approach to Bilingual Word Clustering

We present an information theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence and cross-lingual evidence from parallel corpora to learn high-quality word clusters jointly in any number of languages. The monolingual component of our objective is the average mutual information of clusters of adjacent words in each language, while the...

Resolving Translation Ambiguity Using Non-Parallel Bilingual Corpora

This paper presents an unsupervised method for choosing the correct translation of a word in context. It learns disambiguation information from nonparallel bilingual corpora (preferably in the same domain) free from tagging. Our method combines two existing unsupervised disambiguation algorithms: a word sense disambiguation algorithm based on distributional clustering and a translation disambigu...

Automatically Discovering Word Senses

We will demonstrate the output of a distributional clustering algorithm called Clustering by Committee that automatically discovers word senses from text.

Semantic Clustering of Russian Web Search Results: Possibilities and Problems

The present paper deals with word sense induction from lexical co-occurrence graphs. We construct such graphs on large Russian corpora and then apply the data to cluster the results of Mail.ru search according to meanings in the query. We compare different methods of performing such clustering and different source corpora. Models of applying distributional semantics to big linguistic data are d...

Russian Named Entities Recognition and Classification Using Distributed Word and Phrase Representations

The paper presents results on Russian named entities classification and equivalent named entities retrieval using word and phrase representations. It is shown that a word or an expression’s context vector is an efficient feature to be used for predicting the type of a named entity. Distributed word representations are now claimed (and on a reasonable basis) to be one of the most promising distr...

Publication date: 2006
